ABSTRACT: The next generation of earth-observation satellites will have significantly
increased performance requirements. New advanced compression techniques such as
Bit-Plane Encoding and wavelet-based transformation steps are gaining importance.
However, due to the small size of the space electronics market, the availability of devices
capable of implementing such algorithms is decreasing.

This has motivated EADS-Astrium GmbH to search for new processing technologies that
can be transferred in the short term to reliable commercial space technologies. The
emphasis is put on reconfigurable processing, since this is the only way to reduce risks
and costs and assure the proper functionality of the satellite during its whole life.


The eXtreme Processing Platform (XPP) is a new runtime reconfigurable processor technology.
An ESA study named "XPP Applicability Study" has already been carried out to
prove the feasibility of this new technology and its superiority over other architectures
offered on the market. An important part of this study was also the transfer of the
architecture to a radiation-tolerant semiconductor technology.


The small satellite mission BayernSAT of the Technische Universität München will serve
as a demonstration of the image processing capabilities of a new reconfigurable
processing technology, the XPP, integrated in a configurable processor system based on
the LEON SPARC processor.


1.  INTRODUCTION  AND  MOTIVATION 

The computing power of general-purpose CPUs is not sufficient for certain applications,
for instance real-time video or image compression. The situation is especially difficult
in space, where the harshness of the environment causes the majority of commercial
off-the-shelf (COTS) components to fail within a short time. The available
radiation-hardened general-purpose processors (e.g. ERC32) are not suitable for
data processing applications because of their limited performance (usually only some tens of
MIPS).

The traditional scheme used in the space industry up to now is to accelerate the
computing-intensive parts of the algorithms using application-specific integrated circuits
(ASICs). There are two important problems associated with this approach. First, hardware
development is always a slow process. Several years of development are
necessary before a design is completed, which delays missions and increases costs. The
second problem is the lack of reconfigurability, defined as the ability of a circuit to
change its functionality by reprogramming part or all of the circuit. ASICs cannot be
reprogrammed; thus algorithm modifications due to standard updates or mission
requirement changes, error corrections and other similar changes are not possible,
which reduces the flexibility of the whole system.

The solution is to implement these computing-intensive parts in reconfigurable
processors. There has recently been a lot of research in the field of commercial
configurable processors [1]. A configurable processor is a single-chip combination of a
microprocessor or microcontroller, programmable logic, memory and a dedicated
system bus [Figure 1]. While the programmable logic (reconfiguration unit) takes care of
the computing-intensive parts of the application, the microprocessor is used for the rest of
the calculations. Configurable computing architectures are based on two different
concepts, according to the level of abstraction provided by the programmable hardware.
Most current projects use reconfigurable fine-grained FPGA (Field-Programmable
Gate Array) logic. The trend, however, is to use newer architectures based on
coarse-grained logic, i.e., the integration of several complete ALUs or
multipliers. The XPP (PACT) is one of these new architectures and will be discussed in
the next sections. Although there has been a lot of discussion on the topic, it is still
unclear which architecture delivers the best results. Performance depends strongly
on the kind of algorithms used in the final application. In general, a coarse-grained
processor is the better solution for handling n-bit data words, and an FPGA is very
effective for handling single bits.


Figure  1:  Configurable  Processors 


The BayernSAT image compression system is based on all the aforementioned concepts,
integrating technologies suitable for use in space. The mission requirements, the
processing chain and the proposed architecture, with special emphasis on the new
XPP reconfigurable computing technology, are explained in the following sections.

2.  IMAGE  COMPRESSION  SYSTEM  FOR  THE  BAYERNSAT 
MICROSATELLITE 

2.1. A Target Satellite Mission: BayernSAT

BayernSAT is a microsatellite project of the Institute of Astronautics of the Technische
Universität München [2] [3], which serves as a demonstrator for several technologies.
The most important of these is telepresence. BayernSAT uses a relay satellite to
enlarge the communication window [Figure 2]. Pictures of the earth are taken on board as
selected by a user. These pictures will be transmitted to the end user in less than a
second, using the extended coverage enabled by the relay satellite, to demonstrate the
feasibility of telepresence.

Figure 2: Communication Architecture in BayernSAT [3]


The raw data taken by the camera must be transmitted with the minimum possible delay
over a Ka-band high-gain link that provides less than 512 Kbps for the transmission of
information. Reducing the data volume in real time without degrading the
quality of the captured pictures is therefore of the greatest importance. For this
purpose, powerful new processing technologies and efficient compression techniques
must be used.
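
To give a feeling for the link constraint, the following Python sketch estimates the compression ratio needed to send one image within a given time. Only the 512 Kbps figure and the one-second target come from the text; the image dimensions and bit depth are hypothetical placeholders rather than mission parameters, 1 Kbps is taken as 1000 bit/s, and protocol overhead and processing time are ignored.

```python
def required_compression_ratio(width_px, height_px, bits_per_pixel,
                               link_bps, deadline_s):
    """Ratio between the raw image size and the bits the link can carry in time."""
    raw_bits = width_px * height_px * bits_per_pixel
    budget_bits = link_bps * deadline_s
    return raw_bits / budget_bits


# Hypothetical 1024 x 1024 image at 8 bits per pixel (assumed, not a mission figure).
ratio = required_compression_ratio(1024, 1024, 8, link_bps=512_000, deadline_s=1.0)
print(f"required compression ratio: {ratio:.1f}:1")
```

Under these assumed numbers the raw image is roughly sixteen times larger than the link budget, which is why compression has to happen on board and in real time.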

2.2.  An  Image  Compression  Algorithm:  The  CCSDS  Standard 

The  new  CCSDS  Recommendation  [4]  for  image  data  compression  describes  a  technique 
for  a  data-compression  algorithm  applied  to  digital  data  from  payload  instruments  and 
specifies  how  these  compressed  data  will  be  inserted  into  source  packets. 

This standard is similar to the commercial JPEG2000, but it specifically targets use
on board spacecraft and, being less complex, can be fully implemented either in
hardware or in software.

The  compressor  consists  of  two  functional  parts,  a  Discrete  Wavelet  Transform  which 
performs  decorrelation  of  the  input  pixels  and  a  Bit-Plane  encoder,  which  encodes  the 
decorrelated  data,  as  shown  in  Figure  3. 


Figure  3:  General  Schematic  of  the  Coder 
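
The idea behind the second stage can be illustrated with a minimal Python sketch. It is not the encoder specified in the CCSDS Recommendation; it only shows what splitting coefficient magnitudes into bit planes means, and why sending the most significant planes first yields a coarse approximation early. The function names and the sample values are illustrative assumptions.

```python
def bit_planes(values, num_bits):
    """Split non-negative integers into bit planes, most significant plane first."""
    planes = []
    for bit in range(num_bits - 1, -1, -1):
        planes.append([(v >> bit) & 1 for v in values])
    return planes


def from_bit_planes(planes):
    """Rebuild integers from the planes received so far (most significant first)."""
    values = [0] * len(planes[0])
    for plane in planes:
        values = [(v << 1) | b for v, b in zip(values, plane)]
    return values


magnitudes = [5, 0, 3, 1]                 # illustrative coefficient magnitudes
planes = bit_planes(magnitudes, num_bits=3)
full = from_bit_planes(planes)            # all planes: exact values
coarse = from_bit_planes(planes[:2])      # top two planes only: values >> 1
```

Each plane is a pure stream of single bits, which is the kind of workload the paper later assigns to fine-grained FPGA logic.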


2.3.  A  New  Processing  Architecture:  XPP-LEON 

The  requirements  of  the  target  satellite  mission  and  the  algorithm  calculations  have  been 
established.  The  next  step  is  to  define  a  suitable  architecture  for  the  image  processing 
implementation. The use of reconfigurable computing technologies enables the reuse of such
a processing architecture for a different mission or application without having to invest
much time in new development.

As stated in Section 1, a configurable processor is based on a combination of a
microcontroller, configurable logic, memory and a dedicated system bus. In the European
space community there is already a successful platform for embedded systems,
the LEON SPARC V8 processor (Gaisler Research) [5]. This architecture is based on an
AMBA system bus [6]. The processing power of the LEON does not enable a complete
real-time implementation of the Bit-Plane encoder algorithm in software. For this
reason, two different reconfigurable units have been defined: one is FPGA-based,
the other is a reconfigurable processor array.

The Bit-Plane Encoder, which performs mostly bitwise operations, shall be implemented
in the FPGA logic. The DWT 5/3 transformation, which operates on 8- to 16-bit
pixels, will be implemented on the XPP reconfigurable processor, whose performance
for this algorithm has already been demonstrated [7].
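
As an illustration of the word-oriented arithmetic involved, the sketch below implements one level of a one-dimensional reversible 5/3 lifting transform in Python, in the form known from JPEG 2000. It is an assumption that this matches the "DWT 5/3" meant here; the function names, the restriction to even-length inputs and the mirrored boundary handling are choices made for this sketch, not details taken from the paper or from the XPP implementation in [7].

```python
def dwt53_forward(x):
    """One level of the reversible 5/3 lifting transform on an even-length list."""
    assert len(x) >= 2 and len(x) % 2 == 0
    even, odd = x[0::2], x[1::2]
    half = len(odd)
    # Predict step: high-pass coefficients (>> is a floor shift, also for negatives).
    d = []
    for n in range(half):
        right = even[n + 1] if n + 1 < half else even[n]   # mirror at the right edge
        d.append(odd[n] - ((even[n] + right) >> 1))
    # Update step: low-pass coefficients.
    s = []
    for n in range(half):
        left = d[n - 1] if n > 0 else d[0]                 # mirror at the left edge
        s.append(even[n] + ((left + d[n] + 2) >> 2))
    return s, d


def dwt53_inverse(s, d):
    """Undo dwt53_forward exactly, using integer arithmetic only."""
    half = len(s)
    even = []
    for n in range(half):
        left = d[n - 1] if n > 0 else d[0]
        even.append(s[n] - ((left + d[n] + 2) >> 2))
    x = []
    for n in range(half):
        right = even[n + 1] if n + 1 < half else even[n]
        x.append(even[n])
        x.append(d[n] + ((even[n] + right) >> 1))
    return x


pixels = [10, 12, 14, 13, 9, 7, 8, 200]   # illustrative 8-bit samples
low, high = dwt53_forward(pixels)
restored = dwt53_inverse(low, high)
```

Every step is an addition, subtraction or shift on whole data words, which is the kind of operation a coarse-grained ALU array handles directly. A two-dimensional transform would apply the same routine to rows and then to columns.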


Figure  4:  BayernSAT  Image  Processing  Architecture 


The result is a powerful reconfigurable computing technology [Figure 4], which can be
used for bitwise operations (typically found in encoders, CRCs, scramblers, etc.) as well as
for byte-wise operations (decorrelators, multiplexers, etc.). The LEON processor is in
charge of all other operations, including control, synchronization and failure detection
and correction.


3.  THE  XPP  RECONFIGURABLE  COMPUTING  TECHNOLOGY 

The eXtreme Processing Platform (XPP) from PACT [9] is a new data processing
architecture. Its processing power is based on the following three features, each of which
is explained in detail in the next sections.

The XPP Array: The heart of the XPP is a scalable array of configurable processing array
elements (PAEs). There are two different types of PAEs: ALU-PAEs and RAM-PAEs.
The former perform the basic computations, whereas the RAM-PAEs are used for
data storage. Other important elements of the XPP architecture are the I/O elements, which
connect the internal processing elements to external RAMs or data ports.


The  packet-oriented  communications  network:  XPP  is  designed  to  simplify  the 
programming  task  and  to  allow  high  level  compilers  to  tap  the  full  parallel  potential  of 
the  XPP.  The  most  important  feature  to  support  this  is  the  packet  handling.  Unlike 
FPGAs,  which  transfer  data  strictly  from  register  to  register  with  each  clock  cycle,  XPP 
transfers  packets  of  data. 

Dynamic  Reconfiguration:  XPP  is  designed  to  allow  fast  reconfiguration  of  the  array.  In 
contrast  to  FPGAs,  which  require  configuration  memories  in  the  range  of  Mbits,  XPP 
needs  only  Kbits  for  a  full  configuration.  The  configuration  manager  loads  the  different 
configurations  into  the  array  according  to  a  programmed  sequence. 

3.1.  The  XPP  Array 

The XPP Array in the current implementation of the XPP technology is the XPP64A1,
shown in Figure 5. It has 64 ALU-PAEs and 16 RAM-PAEs. These elements are
connected by event and data channels.

Figure 5: XPP64A1 Array [9]

1)  ALU-PAE 

In  Figure  6,  a  detailed  description  of  an  ALU-PAE  is  shown.  An  ALU-PAE  consists  of 
three  elements:  ALU,  FREG  and  BREG. 

ALU-Object:  The  ALU-Object  can  perform  arithmetical  and  logical  operations.  Each  of 
these  operation  codes  (opcodes)  requires  only  one  clock  cycle  for  its  execution. 


Figure  6:  ALU-PAE  [9] 

FREG  Object:  The  Forward  Register  provides  vertical  data  and  event  routing  channels  in 
the  top  to  bottom  direction  between  the  horizontal  channels.  A  small  ALU  is  also 
provided,  allowing  some  flow  control  and  counting  operations. 

BREG Object: The Backward Register provides vertical data and event routing channels in
the bottom-to-top direction. It also contains a small ALU that allows some simple arithmetic
operations. The BREG permits the implementation of lookup tables using the values on
the event channels as references.

2)  RAM-PAE 

The  other  kind  of  processing  array  element  (PAE)  is  the  RAM-PAE,  whose  structure  is 
shown  in  Figure  7. 


Figure  7:  RAM-PAE  [9] 

The XPP Array contains embedded memories, named RAM-Objects. A
RAM-Object can work in two different modes: as a dual-ported RAM or as a FIFO. The
routing registers FREG and BREG have the same functionality as in an ALU-PAE. The
embedded memories enable high-speed buffering, data storage and table
lookup.


3.2.  The  Packet-Oriented  Communication  Network 


The  most  important  feature  of  the  XPP  technology  is  the  packet-handling.  The  PAEs 
interchange  data  and  event  packets.  This  communication  is  carried  out  by  two  separate 
data  and  event  networks. 

Data packets contain one processor word (e.g. 24 bits) and are created at the output of
objects as soon as incoming data is available. From there, they propagate to the connected
inputs. If more than one input is connected to the output, the packet is duplicated.
Conversely, an XPP object starts its calculation only when all required input packets are
available. If one of the packets has not arrived, the pipeline stalls until it
arrives. This is illustrated in Fig. 8.
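
The firing rule described above can be mimicked with a small software model. The Python sketch below is a toy, not a description of the XPP hardware or its tools: the class name, the use of queues for channels and the choice of an adder are all assumptions made for illustration. It shows an object that stalls while an input packet is missing and that copies its result packet to every connected input.

```python
from collections import deque


class AdderObject:
    """Toy two-input object: it fires only when a packet waits on every input."""

    def __init__(self):
        self.inputs = (deque(), deque())
        self.outputs = []          # queues of the connected downstream inputs

    def connect(self, queue):
        self.outputs.append(queue)

    def step(self):
        """Try to fire once; return False if the object has to stall."""
        if not all(self.inputs):   # an empty deque is falsy: a packet is missing
            return False
        a = self.inputs[0].popleft()
        b = self.inputs[1].popleft()
        for queue in self.outputs: # the result packet is duplicated per connection
            queue.append(a + b)
        return True


adder = AdderObject()
sink_a, sink_b = deque(), deque()
adder.connect(sink_a)
adder.connect(sink_b)

adder.inputs[0].append(3)
fired_first = adder.step()         # second operand missing, so the object stalls
adder.inputs[1].append(4)
fired_second = adder.step()        # both packets present, so the object fires
```

Because synchronization is implied by packet availability, no explicit clock-by-clock scheduling of the operands is written anywhere in the model.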

Events are also handled as packets, but they are only one bit wide. Events originate from ALU
operations or external ports. These events can be used not only as inputs to ALU
functions but also to control the flow of data packets through the array. Events can
also be combined in BREG objects to form small lookup tables, allowing more
complex control structures.

3.3.  Dynamic  Reconfiguration 

Many algorithms are too complex to fit into the array at once, so the array must be
reconfigured quickly while the application runs. In contrast to FPGAs, which
require configuration data in the order of Mbytes and long configuration
times (usually some milliseconds), a configuration in XPP is only Kbytes in size and
is loaded in only some microseconds.

Small parts of the array can also be reconfigured without stopping calculations or
other configurations running on the same array. This feature is known as partial
reconfiguration, and it is particularly useful for changing constants or coefficients of a
given algorithm.

Dynamic reconfiguration extends the instruction flow of conventional
microprocessors to a configuration flow for more complex algorithms, as shown in
Figure 9.


Figure  9:  Reconfiguration  flow  in  XPP  [8] 
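
The notion of a configuration flow can be sketched in Python as follows. This is a conceptual model only: `ToyArray`, `run_sequence` and the two example configurations are hypothetical names invented for this illustration and do not correspond to the PACT tools or to the Configuration Manager interface. Each configuration plays the role that a single instruction plays in a conventional processor, and a programmed sequence of configurations is loaded one after another onto the same array.

```python
class ToyArray:
    """Stand-in for the array: it holds one configuration at a time."""

    def __init__(self):
        self.loaded = None

    def configure(self, name, function):
        self.loaded = (name, function)

    def run(self, data):
        _, function = self.loaded
        return function(data)


def run_sequence(array, sequence, data):
    """Load each configuration in the programmed order and run it on the data."""
    trace = []
    for name, function in sequence:
        array.configure(name, function)
        data = array.run(data)
        trace.append(name)
    return data, trace


# Two hypothetical configurations that together form one larger algorithm.
sequence = [
    ("scale", lambda words: [2 * w for w in words]),
    ("offset", lambda words: [w + 1 for w in words]),
]
result, trace = run_sequence(ToyArray(), sequence, [1, 2, 3])
```

The model says nothing about timing; the short configuration times quoted in the text are what make such a sequence practical on real hardware.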


3.4.  Algorithm  Mapping 

An algorithm expressed as a data-flow graph can be mapped directly onto the array. The
algorithm is coded in NML (Native Mapping Language). This simple descriptive
language exploits all the XPP features without overhead and allows sub-modules
to be defined.

There  is  also  a  Vectorizing  C-Compiler,  which  makes  extensive  use  of  optimized 
modules  from  a  library,  and  allows  the  use  of  a  subset  of  C  to  define  algorithms  for  the 
XPP. 

3.5.  An  Example  Application:  The  Wavelet  Transform 

The regularity and parallelism of the XPP architecture are well suited to matrix
calculations like those usually encountered in digital signal processing. As an
example, the wavelet transformations for the lossless and lossy
compression modes of the JPEG 2000 Still Image Compression Standard were implemented
on the XPP64A1 [7].

The execution time for the lossless integer transformation was in the order of 30
milliseconds, whereas for the lossy floating-point transformation it was about 100
milliseconds. In addition to fast image processing, the study also demonstrated the
flexibility of the XPP PAEs in handling floating-point operations.


4.  IMPLEMENTATION  OF  THE  RECONFIGURABLE  COMPUTING 
TECHNOLOGY  FOR  BAYERNSAT 

The reconfigurable computing architecture defined in Section 2 will be implemented for
the microsatellite mission BayernSAT. Although some work has been done [6], the
on-chip integration of the XPP and the LEON could saturate the AMBA system
bus, because the configuration information for the XPP would be transferred over the
same bus as the data. In addition, current radiation-hardened reprogrammable FPGAs
are not large enough to accommodate the LEON processor, the peripherals and the other
configurable logic.

For these reasons, the solution proposed for BayernSAT is based on the one shown
in Fig. 10. The current implementation of the XPP technology, the XPP64A1
(a 180 nm CMOS chip from ST Microelectronics), is placed on the board. This
technology can be susceptible to latch-up. To avoid destruction of the device by this
effect, a latch-up current breaker will be implemented.

The configuration of the XPP will be managed by a Configuration Manager (CM)
implemented on a radiation-hardened FPGA. The CM will hold the last configuration
information stored in the FLASH memory and will use a separate bus, the CBUS, to
configure the XPP. Possible updates of the XPP configuration information can be
transferred from the SpaceWire interface to the memory controller on the FPGA and
stored in FLASH.

Figure 10: Image Processing System. Functional block description


The  LEON,  the  bus  system  and  the  BPE  Encoding  processor  will  be  implemented  on  a 
radiation-tolerant,  reconfigurable  FPGA.  The  XPP  will  access  the  AMBA-Bus  system 
using  a  parallel  interface  connected  through  the  APB/AHB  Bridge. 


5.  CONCLUSION 


The use of configurable processors is a cost-effective way to address the lack of
powerful space-hardened processors caused by the shrinkage of the market. A configurable
processor consists of a microprocessor, memory, a system bus and reconfigurable logic.

In  this  paper,  a  configurable  processor,  based  on  the  LEON  +  AMBA  bus  platform,  for 
the  real  time  image  compression  in  BayernSAT  was  proposed.  This  satellite  aims  to 
demonstrate  the  telepresence  technology  by  transmitting  pictures  of  the  earth  in  real-time 
over  a  relay  satellite.  The  compressed  output  stream  is  encoded  according  to  the  CCSDS 
standard  for  image  compression. 

For the implementation of the algorithm, two different reconfigurable logic units were
defined. The Bit-Plane Encoder will be implemented using FPGA logic. The Wavelet
Transform will be ported to the new XPP reconfigurable technology, which has shown
superior performance when handling data-word-based algorithms.

The XPP Computing Technology is a European runtime reconfigurable technology
well suited for image and video processing. Its on-orbit validation will be performed
during the BayernSAT mission, an initiative of the Technische Universität München,
scheduled for launch in 2008.